AITopics | offline reinforcement

Towards Instance-Optimal Offline Reinforcement Learning with Pessimism

Neural Information Processing Systems | Apr-25-2026, 01:53:36 GMT

We study the offline reinforcement learning (offline RL) problem, where the goal is to learn a reward-maximizing policy in an unknown Markov Decision Process (MDP) using the data coming from a policy µ. In particular, we consider the sample complexity problems of offline RL for finite-horizon MDPs. Prior works study this problem based on different data-coverage assumptions, and their learning guarantees are expressed by the covering coefficients which lack the explicit characterization of system quantities.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Rethinking Optimal Transport in Offline Reinforcement Learning

Neural Information Processing Systems | Mar-22-2026, 17:32:56 GMT

We propose a novel algorithm for offline reinforcement learning using optimal transport. Typically, in offline reinforcement learning, the data is provided by various experts, some of whom can be sub-optimal. To extract an efficient policy, it is necessary to stitch the best behaviors from the dataset. To address this problem, we rethink offline reinforcement learning as an optimal transportation problem. Based on this, we present an algorithm that aims to find a policy that maps states to a partial distribution of the best expert actions for each given state. We evaluate the performance of our algorithm on continuous control problems from the D4RL suite and demonstrate improvements over existing methods.

artificial intelligence, machine learning, reinforcement learning, (6 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.81)

82240d93542b74d0c4fdffca39cb779f-Paper-Conference.pdf

Neural Information Processing Systems | Feb-16-2026, 05:19:26 GMT

machine learning, reinforcement, reinforcement learning, (18 more...)

Country:

Europe > Denmark > Southern Denmark (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

SMPL: Simulated Industrial Manufacturing and Process Control Learning Environments

Neural Information Processing Systems | Feb-11-2026, 06:37:05 GMT

Our goal is to bridge the gap between deep reinforcement learning research and industrial manufacturing by creating simulation environments that model real-world factories.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada > Ontario > Toronto (0.04)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.94)
Leisure & Entertainment > Games > Computer Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

A Author Statement: The authors of this work would like to state that we bear full responsibility for any potential violation

Neural Information Processing Systems | Feb-11-2026, 05:41:07 GMT

Table 3 presents the details of the datasets in the HoK1v1 task. Spells set to frenzy. Generally, a level of "1" is used for datasets with the "norm" prefix, while a level [...] This distinction indicates varying levels of difficulty. In the Generalization category, "norm_general" and "hard_general" have their corresponding datasets. For example, to sample the "norm_general" dataset, we let the level-1 model fight with level-0, level-[...] For example, in the "norm_hero_general" experiment, we directly use the model trained on "norm_medium" dataset [...] only contains the fixed default hero "luban."

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Industry: Law (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.47)

119b45b5c2020d6bc9bca1e42826a2b3-Paper-Conference.pdf

Neural Information Processing Systems | Feb-8-2026, 02:47:55 GMT

Despite its potential, offline RL faces two significant challenges that impact its performance.

machine learning, pas, reinforcement learning, (18 more...)

Country:

North America > United States > New York > Erie County > Buffalo (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Florida > Orange County > Orlando (0.04)

Genre: Research Report (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.47)

212ab20dbdf4191cbcdcf015511783f4-Paper.pdf

Neural Information Processing Systems | Feb-7-2026, 19:45:44 GMT

assumption, international conference, reinforcement, (11 more...)

Country: Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)

Adversarial Fine-tuning in Offline-to-Online Reinforcement Learning for Robust Robot Control

Ayabe, Shingo, Kera, Hiroshi, Kawamoto, Kazuhiko

arXiv.org Artificial Intelligence | Oct-16-2025

Offline reinforcement learning enables sample-efficient policy acquisition without risky online interaction, yet policies trained on static datasets remain brittle under action-space perturbations such as actuator faults. This study introduces an offline-to-online framework that trains policies on clean data and then performs adversarial fine-tuning, where perturbations are injected into executed actions to induce compensatory behavior and improve resilience. A performance-aware curriculum further adjusts the perturbation probability during training via an exponential-moving-average signal, balancing robustness and stability throughout the learning process. Experiments on continuous-control locomotion tasks demonstrate that the proposed method consistently improves robustness over offline-only baselines and converges faster than training from scratch. Matching the fine-tuning and evaluation conditions yields the strongest robustness to action-space perturbations, while the adaptive curriculum strategy mitigates the degradation of nominal performance observed with the linear curriculum strategy. Overall, the results show that adversarial fine-tuning enables adaptive and robust control under uncertain environments, bridging the gap between offline efficiency and online adaptability.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

2510.13358

Country: Asia > Japan (0.14)

Genre: Research Report > New Finding (0.48)

Industry:

Education > Educational Setting (0.48)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Deterministic Uncertainty Propagation for Improved Model-Based Offline Reinforcement Learning

Neural Information Processing Systems | Oct-10-2025, 07:44:20 GMT

We present a theoretical result demonstrating the strong dependency of suboptimality on the number of Monte Carlo samples taken per Bellman target calculation.

dataset, offline reinforcement, reinforcement, (14 more...)

Country: